As an important variant of entity alignment (EA), multi-modal entity alignment (MMEA) aims to discover identical entities across different knowledge graphs (KGs) with multiple modalities like images. However, current MMEA algorithms all adopt KG-level modality fusion strategies and ignore modality differences among individual entities, hurting robustness to potential noise in particular modalities (e.g., unidentifiable images and relations). In this paper, we present MEAformer, a multi-modal entity alignment transformer approach for meta modality hybrid, which dynamically predicts the mutual correlation coefficients among modalities for instance-level feature fusion. A modal-aware hard entity replay strategy is also proposed to address vague entity details. Extensive experimental results show that our model not only achieves SOTA performance in multiple training scenarios, including supervised, unsupervised, iterative, and low-resource, but also has a limited parameter count, competitive speed, and good interpretability. Our code will be available soon.
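The instance-level fusion idea above can be illustrated with a minimal sketch: per-modality feature vectors for one entity are combined with softmax-normalized weights, so an uninformative modality (e.g., a near-blank image) is automatically down-weighted. The norm-based scorer here is a hypothetical stand-in for the learned cross-modal attention in MEAformer.

```python
import math

def fuse_modalities(feats, temperature=1.0):
    """Fuse per-modality feature vectors for ONE entity.

    feats: dict modality -> list[float], all the same dimension.
    Each modality is scored by its feature norm (a stand-in for a
    learned scorer), the scores are softmax-normalized, and the fused
    vector is the weighted sum. A near-zero modality thus receives a
    small weight instead of polluting the entity representation.
    """
    scores = {m: math.sqrt(sum(x * x for x in v)) / temperature
              for m, v in feats.items()}
    z = sum(math.exp(s) for s in scores.values())
    weights = {m: math.exp(s) / z for m, s in scores.items()}
    dim = len(next(iter(feats.values())))
    fused = [sum(weights[m] * feats[m][i] for m in feats) for i in range(dim)]
    return fused, weights

# a noisy (all-zero) image modality gets the smallest weight
feats = {"graph": [0.9, 0.1], "image": [0.0, 0.0], "text": [0.4, 0.5]}
fused, w = fuse_modalities(feats)
```
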
Optimal Transport (OT) provides a useful geometric framework for estimating the permutation matrix in unsupervised cross-lingual word embedding (CLWE) models that pose the alignment task as a Wasserstein-Procrustes problem. However, linear programming algorithms and approximate OT solvers via Sinkhorn carry a significant computational burden when computing the permutation matrix, since they scale cubically and quadratically, respectively, in the input size. This makes it slow and infeasible to compute OT distances exactly for larger input sizes, resulting in a poor approximation of the permutation matrix and, subsequently, a less robust learned transfer function or mapper. This paper proposes an unsupervised projection-based CLWE model called quantized Wasserstein Procrustes (qWP). qWP relies on a quantization step of both the source and target monolingual embedding spaces to estimate the permutation matrix via a cheap sampling procedure. This approach substantially improves the approximation quality of empirical OT solvers at a fixed computational cost. We demonstrate that qWP achieves state-of-the-art results on the Bilingual Lexicon Induction (BLI) task.
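For intuition, here is a minimal generic Sinkhorn iteration producing the soft permutation (transport) matrix the abstract refers to. It is not the paper's implementation: qWP would first quantize both embedding spaces (e.g., to k-means centroids) precisely so that the `n x n` cost matrix this quadratic-cost loop operates on stays small.

```python
import math

def sinkhorn(cost, reg=0.1, iters=200):
    """Entropic-OT (Sinkhorn) approximation of a soft permutation matrix.

    cost: n x n list of lists of pairwise costs (uniform marginals).
    Alternately rescales rows and columns of K = exp(-cost/reg) until
    the transport plan's row and column sums are (approximately) 1.
    """
    n = len(cost)
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * n
    for _ in range(iters):
        u = [1.0 / sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        v = [1.0 / sum(K[i][j] * u[i] for i in range(n)) for j in range(n)]
    # transport plan P = diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(n)] for i in range(n)]

# toy cost matrix that favours the identity matching
P = sinkhorn([[0.0, 1.0], [1.0, 0.0]])
```

Each full sweep over the cost matrix is O(n^2), which is exactly the quadratic scaling in the input size that qWP's quantization step mitigates.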
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
In knowledge graph completion (KGC), predicting triples involving emerging entities and/or relations, which are unseen when the KG embeddings are learned, has become a critical challenge. Subgraph reasoning with message passing is a promising and popular solution. Some recent methods have achieved good performance, but they (i) usually can only predict triples involving unseen entities alone, failing to address the more realistic fully inductive situation with both unseen entities and unseen relations, and (ii) often conduct message passing over the entities without fully utilizing the relation patterns. In this study, we propose a new method named RMPI, which uses a novel Relational Message Passing network for fully Inductive KGC. It passes messages directly between relations to make full use of the relation patterns for subgraph reasoning, with new techniques for graph transformation, graph pruning, relation-aware neighborhood attention, handling empty subgraphs, etc., and can utilize the relation semantics defined in the ontological schema of the KG. Extensive evaluation on multiple benchmarks has shown the effectiveness of the techniques involved in RMPI and its better performance compared with the existing methods that support fully inductive KGC. RMPI is also comparable to the state-of-the-art partially inductive KGC methods, achieving very promising results. Our codes and data are available at https://github.com/zjukg/RMPI.
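A loose sketch of the core idea of passing messages directly between relations: two relations are treated as neighbours whenever they share an entity in some triple, and each round mixes a relation's feature with its neighbours'. This is a deliberately simplified illustration (uniform averaging instead of RMPI's relation-aware attention, graph transformation, and pruning).

```python
from collections import defaultdict

def relation_message_pass(triples, rel_feats, rounds=1):
    """Average-pooling message passing on the relation-adjacency graph.

    triples: list of (head, relation, tail).
    rel_feats: dict relation -> list[float] initial features (all
               relations appearing in triples must be present).
    Two relations are neighbours iff they are incident to a common
    entity; each round replaces a relation's feature with the mean of
    itself and its neighbours.
    """
    incident = defaultdict(set)            # entity -> relations touching it
    for h, r, t in triples:
        incident[h].add(r)
        incident[t].add(r)
    nbrs = defaultdict(set)                # relation -> neighbouring relations
    for rels in incident.values():
        for r in rels:
            nbrs[r] |= rels - {r}
    feats = {r: list(f) for r, f in rel_feats.items()}
    for _ in range(rounds):
        feats = {
            r: [sum(vec[i] for vec in ([f] + [feats[n] for n in sorted(nbrs[r])]))
                / (1 + len(nbrs[r]))
                for i in range(len(f))]
            for r, f in feats.items()
        }
    return feats

# r1 and r2 share entity "b", so one round mixes their features
feats = relation_message_pass(
    [("a", "r1", "b"), ("b", "r2", "c")],
    {"r1": [1.0], "r2": [0.0]})
```

Because messages live on relations rather than entities, an unseen relation can still be scored from the patterns of the seen relations around it, which is what enables the fully inductive setting.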
Multi-modal aspect-based sentiment classification (MABSC) is an emerging classification task that aims to classify the sentiment of a given target, e.g., an entity mentioned in data with different modalities. In typical multi-modal data with text and images, previous approaches do not make full use of the fine-grained semantics of images, especially in conjunction with the semantics of the text, and do not fully consider modeling the relationship between fine-grained image information and the target, which leads to insufficient use of images and inadequate identification of fine-grained aspects and opinions. To tackle these limitations, we propose a new framework, SeqCSG, which includes a method for constructing sequential cross-modal semantic graphs and an encoder-decoder model. Specifically, we extract fine-grained information from the original image, the image caption, and the scene graph, and regard them as elements of the cross-modal semantic graph, together with the tokens of the text. The cross-modal semantic graph is represented as a sequence with a multi-modal visible matrix indicating the relationships among elements. To effectively utilize the cross-modal semantic graph, we propose an encoder-decoder method with a target prompt template. Experimental results show that our method outperforms existing approaches and achieves state-of-the-art performance on two standard MABSC datasets. Further analysis demonstrates the effectiveness of each component, and our model can implicitly learn the correlation between the target and fine-grained information in images.
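The "visible matrix" idea of representing a graph as a sequence can be sketched as follows: a 0/1 mask over sequence positions in which a position may attend to itself and to positions it is related to in the semantic graph. This is a minimal illustration of the masking mechanism only; how SeqCSG actually links text tokens, caption items, and scene-graph elements is more involved.

```python
def build_visible_matrix(n, edges):
    """0/1 attention-visibility mask for a serialized semantic graph.

    n: sequence length (text tokens plus image/caption/scene-graph
       elements, flattened into one sequence).
    edges: iterable of (i, j) position pairs related in the graph.
    Position i may attend to j iff i == j or (i, j) shares an edge
    (edges are treated as undirected).
    """
    related = set(edges) | {(j, i) for i, j in edges}
    return [[1 if i == j or (i, j) in related else 0 for j in range(n)]
            for i in range(n)]

# positions 0 and 1 are related; position 2 only sees itself
vis = build_visible_matrix(3, [(0, 1)])
```

In a transformer encoder, such a matrix would be added (as `-inf` at the zeros) to the attention logits, so unrelated elements cannot exchange information directly.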
Visual question answering (VQA) often requires an understanding of visual concepts and language semantics, which relies on external knowledge. Most existing methods exploit pre-trained language models and/or unstructured text, but the knowledge in these resources is often incomplete and noisy. Some other methods prefer to use knowledge graphs (KGs), which often have intensive structured knowledge, but the research is still quite preliminary. In this paper, we propose LaKo, a knowledge-driven VQA method via late knowledge-to-text injection. To effectively incorporate an external KG, we transfer triples into textual format and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm. In the evaluation with the OKVQA dataset, our method achieves state-of-the-art results.
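The triple-to-text step can be sketched in a few lines: each KG triple is verbalized into a short sentence, and the joined string is what gets injected alongside the question before encoding. The surface template below is a hypothetical stand-in; the paper's exact verbalization may differ.

```python
def verbalize_triples(triples):
    """Turn KG triples into plain text for late knowledge injection.

    triples: list of (head, relation, tail) strings; underscores in
    relation names are read as spaces. Each triple becomes one short
    sentence, and the sentences are joined into a single context
    string to concatenate with the question.
    """
    return " ".join(f"{h} {r.replace('_', ' ')} {t}."
                    for h, r, t in triples)

ctx = verbalize_triples([("banana", "has_color", "yellow"),
                         ("banana", "is_a", "fruit")])
# ctx == "banana has color yellow. banana is a fruit."
```

Injecting knowledge as text (rather than as graph embeddings) lets a standard encoder-decoder consume it without any architectural change.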
Video recording is a widely used method for documenting infant and child behaviours in research and clinical practice. Video data are rarely shared due to ethical concerns of confidentiality, although the need for shared large-scale datasets continues to increase. This demand is even more imperative when data-driven computer-based approaches are involved, such as screening tools that complement clinical assessments. To share data while abiding by privacy protection rules, a critical question arises: does this reduce the utility of the data? We addressed this question by showcasing Prechtl's general movements assessment (GMA), an established and globally practised video-based diagnostic tool used in early infancy to detect neurological deficits such as cerebral palsy. To date, no shared expert-annotated large data repositories for infant movement analysis exist. Such datasets would massively benefit the training and recalibration of human assessors and the development of computer-based approaches. In the current study, sequences from a prospective longitudinal infant cohort, with a total of 19,451 available general movement video snippets, were randomly selected for human clinical reasoning and computer-based analysis. We demonstrated for the first time that pseudonymisation of video recordings by face blurring is a viable approach. The video redaction did not affect the classification accuracy of either human assessors or computer vision methods, suggesting an adequate and easy-to-apply solution for sharing movement video data. We call for further exploration of efficient and privacy-rule-conforming approaches for de-identifying video data in scientific and clinical fields beyond movement assessments. These approaches should enable the sharing and merging of stand-alone video datasets into large data pools to advance science and public health.
Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training, often utilizing additional semantic information (a.k.a. side information) to bridge the training (seen) classes and the unseen classes. One of the most effective and widely used kinds of semantic information for zero-shot image classification is attributes, which are annotations of class-level visual characteristics. However, due to the shortage of fine-grained annotations and the imbalance and co-occurrence of attributes, current methods often fail to discriminate those subtle visual distinctions between images, limiting their performance. In this paper, we present a transformer-based end-to-end ZSL method named DUET, which integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) developed a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images, (2) applied an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics against attribute co-occurrence and imbalance, and (3) proposed a multi-task learning policy for considering multi-model objectives. With extensive experiments on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark, we find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
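The attribute-level contrastive strategy can be illustrated with a generic InfoNCE-style loss: the image-side attribute embedding is pulled toward its matching class attribute (the positive) and pushed away from co-occurring or imbalanced attributes (the negatives). This is a minimal sketch of the standard contrastive objective, not DUET's exact formulation.

```python
import math

def attribute_contrastive_loss(anchor, positive, negatives, tau=0.5):
    """InfoNCE-style contrastive loss over attribute-level features.

    anchor, positive: feature vectors as plain lists; negatives is a
    list of such vectors. Similarity is the dot product scaled by the
    temperature tau; the loss is low when the anchor is much more
    similar to the positive than to any negative.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(anchor, positive) / tau)
    neg = sum(math.exp(dot(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

anchor = [1.0, 0.0]
well_aligned = attribute_contrastive_loss(anchor, [1.0, 0.0], [[0.0, 1.0]])
misaligned = attribute_contrastive_loss(anchor, [0.0, 1.0], [[1.0, 0.0]])
```

Contrasting at the attribute level (rather than the class level) is what lets the model separate attributes that tend to co-occur across classes.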
The past several years have witnessed the success of transformer-based models, and their scale and application scenarios continue to grow aggressively. The current landscape of transformer models is increasingly diverse: model size varies drastically, with the largest reaching hundreds of billions of parameters; model characteristics differ due to the sparsity introduced by Mixture-of-Experts; target application scenarios can be latency-critical or throughput-oriented; and deployment hardware can be single- or multi-GPU systems with different types of memory and storage. With this increasing diversity and the fast-evolving pace of transformer models, designing a highly performant and efficient inference system is extremely challenging. In this paper, we present DeepSpeed Inference, a comprehensive system solution for transformer model inference that addresses the above challenges. DeepSpeed Inference consists of (1) a multi-GPU inference solution that minimizes latency while maximizing the throughput of both dense and sparse transformer models when they fit in aggregate GPU memory, and (2) a heterogeneous inference solution that leverages CPU and NVMe memory in addition to GPU memory and compute to enable high inference throughput for large models that do not fit in aggregate GPU memory. DeepSpeed Inference reduces latency by up to 7x over the state of the art for latency-oriented scenarios and increases throughput by over 1.5x for throughput-oriented scenarios. Moreover, it enables trillion-parameter-scale inference under real-time latency constraints by leveraging hundreds of GPUs, an unprecedented scale for inference. It can inference 25x larger models than GPU-only solutions, while delivering a high throughput of 84 TFLOPS (over 50% of A6000 peak).
Knowledge graphs (KGs) and their ontological variants have been widely used for knowledge representation, and have been shown to be quite effective in augmenting zero-shot learning (ZSL). However, existing ZSL methods that utilize KGs all neglect the intrinsic complexity of the inter-class relationships represented in KGs. One typical feature is that a class is often related to other classes in different semantic aspects. In this paper, we focus on ontologies for augmenting ZSL, and propose to learn disentangled ontology embeddings, guided by ontology properties, to capture and utilize more fine-grained class relationships in different aspects. We also contribute a new ZSL framework named DOZSL, which contains two new ZSL solutions based on generative models and graph propagation models, respectively, for effectively utilizing the disentangled ontology embeddings. Extensive evaluations have been conducted on five benchmarks across zero-shot image classification (ZS-IMGC) and zero-shot KG completion (ZS-KGC). DOZSL often achieves better performance than the state of the art, and its components have been verified by ablation studies and case studies. Our codes and datasets are available at https://github.com/zjukg/dozsl.